Security Measure


Servant, Stalker, Predator: How An Honest, Helpful, And Harmless (3H) Agent Unlocks Adversarial Skills

Noever, David

arXiv.org Artificial Intelligence

This paper identifies and analyzes a novel vulnerability class in Model Context Protocol (MCP)-based agent systems. We describe and demonstrate how benign, individually authorized tasks can be orchestrated to produce harmful emergent behaviors. Through systematic analysis using the MITRE ATLAS framework, we demonstrate how 95 tested agents with access to multiple services (including browser automation, financial analysis, location tracking, and code deployment) can chain legitimate operations into sophisticated attack sequences that extend beyond the security boundaries of any individual service. These red-team exercises examine whether current MCP architectures lack the cross-domain security measures necessary to detect or prevent a large category of compositional attacks. We present empirical evidence of specific attack chains that achieve targeted harm through service orchestration, including data exfiltration, financial manipulation, and infrastructure compromise. These findings reveal that the fundamental security assumption of service isolation fails when agents can coordinate actions across multiple domains, creating an attack surface that grows exponentially with each additional capability. This research provides a barebones experimental framework that evaluates not whether agents can complete MCP benchmark tasks, but what happens when they complete them too well and optimize across multiple services in ways that violate human expectations and safety constraints. We propose three concrete experimental directions using the existing MCP benchmark suite.
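
The compositional failure mode described above lends itself to a short illustration. The sketch below is a hypothetical cross-domain guard, not a mechanism from the paper: every service name, risky pair, and the monitor API are invented. It shows why per-call authorization is insufficient, since each call passes its own check and only the pair of domains trips the alarm.

```python
# Hypothetical sketch of a cross-domain chain monitor for an MCP-style
# agent session. Service names and risky pairs are invented for
# illustration; the paper does not prescribe this mechanism.

RISKY_CHAINS = {
    ("location_tracking", "browser_automation"),  # surveillance + exfiltration
    ("financial_analysis", "code_deployment"),    # analysis + market manipulation
    ("browser_automation", "code_deployment"),    # scraping + infrastructure change
}

class ChainMonitor:
    """Tracks which service domains a session has touched and blocks a
    call whose *combination* with a prior domain crosses a boundary,
    even though each call is individually authorized."""

    def __init__(self):
        self.visited: list[str] = []

    def authorize(self, domain: str) -> None:
        for prior in self.visited:
            if (prior, domain) in RISKY_CHAINS:
                raise PermissionError(
                    f"compositional risk: {prior} -> {domain} chain blocked")
        self.visited.append(domain)

monitor = ChainMonitor()
monitor.authorize("location_tracking")       # fine in isolation
try:
    monitor.authorize("browser_automation")  # the chain, not the call, is unsafe
except PermissionError as err:
    print(err)
```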


Alleviating Attack Data Scarcity: SCANIA's Experience Towards Enhancing In-Vehicle Cyber Security Measures

Sundfeldt, Frida, Widstam, Bianca, Moghadam, Mahshid Helali, Liang, Kuo-Yun, Vesterberg, Anders

arXiv.org Artificial Intelligence

The digital evolution of connected vehicles and the attendant security risks underscore the critical need for in-vehicle cyber security measures such as intrusion detection and response systems. The continuous advancement of attack scenarios further highlights the need for adaptive detection mechanisms that can detect evolving, unknown, and complex threats. The effective use of machine learning (ML)-driven techniques can help address this challenge. However, constraints on implementing diverse attack scenarios on test vehicles due to safety, cost, and ethical considerations result in a scarcity of data representing attack scenarios. This limitation necessitates alternative, efficient, and effective methods for generating high-quality attack-representing data. This paper presents a context-aware attack data generator that produces attack inputs and the corresponding in-vehicle network logs, i.e., controller area network (CAN) logs, representing various types of attacks, including denial of service (DoS), fuzzy, spoofing, suspension, and replay attacks. It uses parameterized attack models, augmented with CAN message decoding and attack intensity adjustments, to configure attack scenarios with high similarity to real-world scenarios and to promote variability. We evaluate the practicality of the generated attack-representing data in an intrusion detection system (IDS) case study, in which we develop and empirically evaluate two deep neural network IDS models using the generated data. In addition to demonstrating the efficiency and scalability of the approach, the IDS models' strong detection and classification performance validates the consistency and effectiveness of the generated data. In this experience study, we also elaborate on the aspects influencing the fidelity of the data to real-world scenarios and provide insights into its application.
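
As a concrete picture of what a parameterized CAN attack generator might look like, here is a minimal Python sketch covering two of the attack types named above (DoS and fuzzy). The frame layout, timing model, and intensity parameter are assumptions for illustration, not the paper's implementation.

```python
import random

# Illustrative generator of CAN attack frames. A DoS attack floods the bus
# with the highest-priority ID (0x000 always wins arbitration); a fuzzy
# attack injects random payloads under IDs seen in normal traffic.

def dos_frames(n: int, intensity: float):
    """Yield a flood of high-priority frames; higher intensity -> denser flood."""
    gap = max(1e-5, 0.01 * (1.0 - intensity))
    t = 0.0
    for _ in range(n):
        t += gap
        yield {"time": round(t, 5), "id": 0x000, "data": bytes(8)}

def fuzzy_frames(n: int, id_pool):
    """Yield frames with random observed IDs and random 8-byte payloads."""
    t = 0.0
    for _ in range(n):
        t += random.uniform(0.001, 0.01)
        yield {"time": round(t, 5),
               "id": random.choice(id_pool),
               "data": bytes(random.getrandbits(8) for _ in range(8))}

for frame in dos_frames(3, intensity=0.9):
    print(frame)
print(next(fuzzy_frames(1, id_pool=[0x110, 0x220])))
```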


Malicious AI Models Undermine Software Supply-Chain Security

Communications of the ACM

Integrating malicious AI models into software supply chains presents a significant and emerging threat to cybersecurity. Attackers aim to embed malicious AI models in software components and widely used tools, thereby infiltrating systems at a foundational level. Once integrated, a malicious AI model executes embedded unauthorized code, which performs actions such as exfiltrating sensitive data, manipulating data integrity, or enabling unauthorized access to critical systems. Compromised development tools, tampered libraries, and pre-trained models are the primary vectors for introducing malicious AI models into the software supply chain. Developers often rely on libraries and frameworks to import pre-trained AI models to expedite software development.
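
One concrete mechanism behind this threat is serialization: many legacy model formats are Python pickles, and unpickling executes embedded code via opcodes such as GLOBAL and REDUCE. The sketch below uses the standard-library pickletools module to statically list code-invoking opcodes in a model file before anything is loaded; the file path is hypothetical, and this is a minimal triage aid rather than a complete defense (dedicated scanners and code-free formats such as safetensors are stronger options).

```python
import pickletools

# Statically list the pickle opcodes that can import and call arbitrary
# Python objects when a serialized model is loaded: GLOBAL/STACK_GLOBAL
# import a callable by name, and REDUCE calls it. Benign model files
# contain these opcodes too, so findings need human review; this is a
# triage aid, not a verdict.

CODE_OPS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def scan_model_pickle(path: str) -> list[str]:
    findings = []
    with open(path, "rb") as f:
        for opcode, arg, pos in pickletools.genops(f):
            if opcode.name in CODE_OPS:
                findings.append(f"{opcode.name} at byte {pos}: {arg!r}")
    return findings

for line in scan_model_pickle("model.pkl"):  # hypothetical file path
    print(line)
```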


Quantifying Security Vulnerabilities: A Metric-Driven Security Analysis of Gaps in Current AI Standards

Madhavan, Keerthana, Yazdinejad, Abbas, Zarrinkalam, Fattane, Dehghantanha, Ali

arXiv.org Artificial Intelligence

As AI systems integrate into critical infrastructure, security gaps in AI compliance frameworks demand urgent attention. This paper audits and quantifies security risks in three major AI governance standards: NIST AI RMF 1.0, the UK Information Commissioner's Office (ICO) AI and Data Protection Risk Toolkit, and the EU's ALTAI. Using a novel risk assessment methodology, we develop four key metrics: Risk Severity Index (RSI), Attack Potential Index (AVPI), Compliance-Security Gap Percentage (CSGP), and Root Cause Vulnerability Score (RCVS). Our analysis identifies 136 concerns across the frameworks, exposing significant gaps. NIST fails to address 69.23 percent of identified risks, ALTAI has the highest attack vector vulnerability (AVPI = 0.51), and the ICO Toolkit has the largest compliance-security gap, with 80.00 percent of high-risk concerns remaining unresolved. Root cause analysis highlights under-defined processes (ALTAI RCVS = 0.33) and weak implementation guidance (NIST and ICO RCVS = 0.25) as critical weaknesses. These findings emphasize the need for stronger, enforceable security controls in AI compliance. We offer targeted recommendations to enhance security posture and bridge the gap between compliance and real-world AI risks.
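
The abstract reports metric values but not their formulas. The sketch below is a back-of-envelope reading only: it assumes CSGP is the percentage of high-risk concerns left unresolved and that the NIST figure is the share of identified risks left unaddressed, with counts solved backward from the reported percentages purely for illustration. The authors' actual definitions are in the paper.

```python
# Plausible stand-in formulas inferred from the abstract's own gloss.
# The counts below are illustrative, chosen to reproduce the reported
# percentages; they are not data from the paper.

def csgp(unresolved_high_risk: int, total_high_risk: int) -> float:
    """Compliance-Security Gap Percentage (assumed definition)."""
    return 100.0 * unresolved_high_risk / total_high_risk

def coverage_gap(unaddressed: int, identified: int) -> float:
    """Share of identified risks a framework fails to address (assumed)."""
    return 100.0 * unaddressed / identified

print(f"ICO CSGP:  {csgp(8, 10):.2f}%")         # 80.00, as reported for the ICO Toolkit
print(f"NIST gap:  {coverage_gap(9, 13):.2f}%")  # 69.23, as reported for NIST
```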


DIESEL -- Dynamic Inference-Guidance via Evasion of Semantic Embeddings in LLMs

Ganon, Ben, Zolfi, Alon, Hofman, Omer, Singh, Inderjeet, Kojima, Hisashi, Elovici, Yuval, Shabtai, Asaf

arXiv.org Artificial Intelligence

In recent years, conversational large language models (LLMs) have shown tremendous success in tasks such as casual conversation, question answering, and personalized dialogue, making significant advancements in domains like virtual assistance, social interaction, and online customer engagement. However, they often generate responses that are not aligned with human values (e.g., ethical standards, safety, or social norms), leading to potentially unsafe or inappropriate outputs. While several techniques have been proposed to address this problem, they come with a cost, requiring computationally expensive training or dramatically increasing the inference time. In this paper, we present DIESEL, a lightweight inference guidance technique that can be seamlessly integrated into any autoregressive LLM to semantically filter undesired concepts from the response. DIESEL can function either as a standalone safeguard or as an additional layer of defense, enhancing response safety by reranking the LLM's proposed tokens based on their similarity to predefined negative concepts in the latent space. This approach provides an efficient and effective solution for maintaining alignment with human values. Our evaluation demonstrates DIESEL's effectiveness on state-of-the-art conversational models (e.g., Llama 3), even in challenging jailbreaking scenarios that test the limits of response safety. We further show that DIESEL can be generalized to use cases other than safety, providing a versatile solution for general-purpose response filtering with minimal computational overhead.
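
A minimal sketch of the reranking idea, assuming a generic sentence-embedding function: candidate next tokens are rescored by subtracting a penalty proportional to the similarity between the extended text and any predefined negative concept. The toy embed() below is a deterministic stand-in so the demo runs; none of this is the authors' implementation, and the penalty weight alpha is an invented parameter.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real sentence-embedding model: a deterministic
    # pseudo-random vector per string. Replace with an actual encoder.
    rng = np.random.RandomState(abs(hash(text)) % (2**32))
    return rng.randn(64)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def rerank_next_token(prefix, candidates, logprobs, negative_concepts, alpha=5.0):
    """Pick the candidate token whose continuation scores best after
    penalizing similarity to negative-concept embeddings."""
    neg = [embed(c) for c in negative_concepts]
    best_tok, best_score = None, float("-inf")
    for tok, lp in zip(candidates, logprobs):
        penalty = max(cosine(embed(prefix + tok), n) for n in neg)
        score = lp - alpha * penalty
        if score > best_score:
            best_tok, best_score = tok, score
    return best_tok

print(rerank_next_token("How do I make a", [" cake", " bomb"],
                        [-1.0, -0.9], ["violence", "weapons"]))
```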


What AI evaluations for preventing catastrophic risks can and cannot do

Barnett, Peter, Thiergart, Lisa

arXiv.org Artificial Intelligence

AI evaluations are an important component of the AI governance toolkit, underlying current approaches to safety cases for preventing catastrophic risks. Our paper examines what these evaluations can and cannot tell us. Evaluations can establish lower bounds on AI capabilities and assess certain misuse risks given sufficient effort from evaluators. Unfortunately, evaluations face fundamental limitations that cannot be overcome within the current paradigm. These include an inability to establish upper bounds on capabilities, reliably forecast future model capabilities, or robustly assess risks from autonomous AI systems. This means that while evaluations are valuable tools, we should not rely on them as our main way of ensuring AI systems are safe. We conclude with recommendations for incremental improvements to frontier AI safety, while acknowledging these fundamental limitations remain unsolved.


Security Threats in Agentic AI System

Khan, Raihan, Sarkar, Sayak, Mahata, Sainik Kumar, Jose, Edwin

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) agents have become increasingly prevalent in various applications, from virtual assistants to complex data analysis systems. However, their direct access to databases raises significant concerns regarding privacy and security. This paper examines these critical issues, focusing on the potential risks posed by unrestricted AI access to sensitive data. The rapid advancement of AI technologies has resulted in systems capable of processing vast amounts of data and generating human-like responses. While this progress has provided numerous benefits, it has also introduced new challenges in ensuring data privacy and security. AI agents with direct access to databases may inadvertently expose confidential information, or they may be exploited by malicious actors to access or manipulate sensitive data. Additionally, AI systems' ability to analyze large datasets increases the risk of unintended privacy violations, making them prime targets for attacks aimed at extracting or misusing data. This paper explores the current landscape of AI agent interactions with databases and analyzes the associated risks. It discusses the potential threats to privacy protection and data security as AI agents become more integrated into various applications.
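
As one concrete mitigation pattern for the database-access risk described above, an agent can be denied raw credentials and routed through a mediating layer. The sketch below is illustrative only: the allowlist, the naive SQL check, and the API are invented, and a regex gate like this is bypassable (joins, subqueries, CTEs); real deployments should rely on database-level read-only roles and views.

```python
import re
import sqlite3

ALLOWED_TABLES = {"products", "public_faq"}  # illustrative allowlist

def guarded_query(conn: sqlite3.Connection, sql: str):
    """Run an agent-issued query only if it is a SELECT over allowlisted
    tables. A naive gate: database-level roles/views are the real control."""
    stmt = sql.strip().rstrip(";")
    if not stmt.lower().startswith("select"):
        raise PermissionError("agent access is read-only")
    tables = {t.lower() for t in re.findall(r"\bfrom\s+(\w+)", stmt, re.IGNORECASE)}
    if not tables or not tables <= ALLOWED_TABLES:
        raise PermissionError(f"non-allowlisted table(s): {tables}")
    return conn.execute(stmt).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.execute("INSERT INTO products VALUES ('widget')")
print(guarded_query(conn, "SELECT name FROM products"))  # permitted
```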


Large Language Models have Intrinsic Self-Correction Ability

Liu, Dancheng, Nassereldine, Amir, Yang, Ziming, Xu, Chenhui, Hu, Yuting, Li, Jiajie, Kumar, Utkarsh, Lee, Changjae, Xiong, Jinjun

arXiv.org Artificial Intelligence

Large language models (LLMs) have attracted significant attention for their remarkable abilities across natural language processing tasks, but they suffer from hallucinations that degrade performance. One promising way to improve an LLM's performance is to ask it to revise its answer after generation, a technique known as self-correction. Of the two types of self-correction, intrinsic self-correction is considered the more promising direction because it does not rely on external knowledge. However, recent work has questioned the validity of LLMs' ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and underscore the importance of unbiased prompts and zero-temperature settings in harnessing their full potential.
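
A hedged sketch of an intrinsic self-correction loop reflecting the two factors highlighted above: greedy decoding (temperature zero) and a revision prompt that does not presuppose the first answer is wrong. The prompt wording and loop structure are my own reading, not the paper's protocol; `llm` is any callable mapping (prompt, temperature) to text.

```python
# Intrinsic self-correction: no external knowledge, just re-prompting the
# same model. The revision prompt is deliberately neutral ("it may already
# be correct") rather than biased ("find the mistake").

REVISE = ("Here is a question and your previous answer. Review the answer "
          "and provide your final answer. It may already be correct.\n\n"
          "Question: {q}\nPrevious answer: {a}\nFinal answer:")

def self_correct(llm, question: str, rounds: int = 2) -> str:
    answer = llm(question, temperature=0.0)  # zero temperature: deterministic
    for _ in range(rounds):
        answer = llm(REVISE.format(q=question, a=answer), temperature=0.0)
    return answer

# Toy demo with a stub model in place of a real LLM client:
print(self_correct(lambda p, temperature: "42", "What is 6 x 7?"))
```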


Towards Sustainable IoT: Challenges, Solutions, and Future Directions for Device Longevity

Shirvani, Ghazaleh, Ghasemshirazi, Saeid

arXiv.org Artificial Intelligence

In an era dominated by the Internet of Things (IoT), ensuring the longevity and sustainability of IoT devices has emerged as a pressing concern. This study explores the complex difficulties that contribute to the early decommissioning of IoT devices and suggests methods to improve their lifespan management. By examining factors such as security vulnerabilities, user awareness gaps, and the influence of fashion-driven technology trends, the paper underscores the need for legislative interventions, consumer education, and industry accountability. Additionally, it explores innovative approaches to improving IoT longevity, including the integration of sustainability considerations into architectural design through requirements engineering methodologies. Furthermore, the paper discusses the potential of distributed ledger technology, or blockchain, to promote transparent and decentralized processes for device provisioning and tracking. This study promotes a sustainable IoT ecosystem by integrating technological innovation, legal change, and social awareness to reduce environmental impact and enhance resilience for the digital future.


Cross-Task Defense: Instruction-Tuning LLMs for Content Safety

Fu, Yu, Xiao, Wen, Chen, Jia, Li, Jiachen, Papalexakis, Evangelos, Chien, Aichi, Dong, Yue

arXiv.org Artificial Intelligence

Recent studies reveal that Large Language Models (LLMs) face challenges in balancing safety with utility, particularly when processing long texts for NLP tasks like summarization and translation. Despite defenses against malicious short questions, the ability of LLMs to safely handle dangerous long content, such as manuals teaching illicit activities, remains unclear. Our work aims to develop robust defenses for LLMs that process malicious documents alongside benign NLP task queries. We introduce a defense dataset comprising safety-related examples and propose single-task and mixed-task losses for instruction tuning. Our empirical results demonstrate that with appropriate instruction tuning, LLMs can significantly enhance their capacity to safely manage dangerous content. Additionally, strengthening the defenses of the tasks most susceptible to misuse is effective in protecting LLMs from processing harmful information. We also observe that trade-offs between utility and safety exist in defense strategies: Llama2, tuned with our proposed approach, achieves a significantly better balance than Llama1.
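
As a rough illustration of a mixed-task objective, the sketch below assumes a Hugging Face-style causal LM whose forward pass returns an object with a .loss attribute when labels are provided; the weighting scheme and batching are assumptions, not the paper's recipe.

```python
def mixed_task_step(model, optimizer, task_batch, safety_batch, safety_weight=0.5):
    """One instruction-tuning update mixing a benign NLP-task batch with a
    safety batch (e.g., refusal targets for malicious documents). Assumes a
    Hugging Face-style model returning an object with a .loss attribute."""
    loss = ((1.0 - safety_weight) * model(**task_batch).loss
            + safety_weight * model(**safety_batch).loss)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return float(loss)
```

Setting safety_weight to 0 recovers a plain single-task objective; sweeping it exposes the utility-safety trade-off the abstract describes.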